Overview

This report provides an evaluation of Experimental Target forecasts of confirmed Influenza Hospitalization Admissions submitted to the CDC FluSight Forecast Hub for the 2021-2022 and 2022-2023 seasons. Experimental target submissions began on December 12, 2022. The experimental target was introduced to give forecasting teams an opportunity to submit forecasts of increasing and decreasing activity. The experimental target, named “2 wk flu hosp rate change”, is submitted as estimates of the probability of occurrence for each rate change category. Guidance for submission of the experimental target is shown here.
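As a rough illustration of the submission format described above, the sketch below builds one forecast in which a probability is assigned to each rate change category. The category names and field names here are hypothetical, not the hub's exact schema; the key property is that the category probabilities form a valid distribution.

```python
# Illustrative sketch of one experimental-target forecast.
# Category and field names are hypothetical placeholders.
categories = ["large_decrease", "decrease", "stable", "increase", "large_increase"]

forecast = {
    "location": "US",
    "forecast_date": "2022-12-12",
    "target": "2 wk flu hosp rate change",
    # One probability of occurrence per rate change category.
    "probabilities": {
        "large_decrease": 0.05,
        "decrease": 0.10,
        "stable": 0.30,
        "increase": 0.40,
        "large_increase": 0.15,
    },
}

# The category probabilities must sum to 1.
total = sum(forecast["probabilities"].values())
assert abs(total - 1.0) < 1e-9
```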

In addition to the models submitted to the GitHub repository CDC FluSight hospitalization data-experimental for the 2022-2023 season, models from CDC FluSight hospitalization data-forecasts that did not submit experimental forecasts were added to this evaluation for both the 2021-2022 and 2022-2023 seasons. We used a version of this code to convert these forecasts into experimental target forecasts.

This report evaluates experimental forecasts of confirmed Influenza Hospitalization Admissions at the state and national levels for the 2021-2022 and 2022-2023 seasons. Data from HealthData are used as the ground truth for evaluating the forecasts.

We evaluate models based on the Brier score. To account for variation in the difficulty of forecasting different weeks and locations, a pairwise approach was used to calculate an adjusted relative Brier score. Models with relative scores lower than 1 were more accurate than the baseline on average, whereas relative scores greater than 1 indicate lower accuracy than the baseline on average. The Flusight-baseline model is used as the baseline in the pairwise comparison.
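The two metrics above can be sketched in a few lines. The Brier score for a categorical forecast is the sum of squared differences between the forecast probabilities and the one-hot observed outcome. For the pairwise adjustment, one common construction (used here as an illustration, not necessarily the hub's exact implementation) computes, for each pair of models, the ratio of their mean scores on shared forecast units, takes the geometric mean of those ratios per model, and rescales so the baseline's relative score equals 1.

```python
import math

def brier_score(probs, observed):
    """Multi-category Brier score: sum of squared differences between
    forecast probabilities and the one-hot observed outcome."""
    return sum((p - (1.0 if cat == observed else 0.0)) ** 2
               for cat, p in probs.items())

def relative_brier(scores, baseline="Flusight-baseline"):
    """Pairwise relative Brier score (illustrative sketch).

    `scores` maps model -> {unit: Brier score}, where a unit is e.g. a
    (location, week) pair. theta(A, B) is the ratio of the two models'
    mean scores on units both forecast; a model's skill is the geometric
    mean of its theta over all models, rescaled by the baseline's skill
    so values below 1 mean "more accurate than baseline"."""
    models = list(scores)

    def theta(a, b):
        shared = scores[a].keys() & scores[b].keys()
        mean_a = sum(scores[a][u] for u in shared) / len(shared)
        mean_b = sum(scores[b][u] for u in shared) / len(shared)
        return mean_a / mean_b

    skill = {a: math.exp(sum(math.log(theta(a, b)) for b in models) / len(models))
             for a in models}
    return {a: skill[a] / skill[baseline] for a in models}
```

For example, a model whose Brier scores are half the baseline's on every shared unit gets a relative score of 0.5, and the baseline itself always scores exactly 1.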

Results

FY 2021-2022 Flu Season

Summary Tables

The table evaluates forecast models based on the relative Brier score, aggregated across weeks and locations.

Inclusion criteria for each column are detailed below the table.

The models included have submitted at least 50% of forecasts during this time, where one forecast is a location, target, forecast date combination. The data are initially ordered by model based on their relative Brier score aggregated across time and location, with the most accurate models at the top.

  • Column 2 lists the number of forecasts a team has submitted.
  • Column 3 shows the Brier score.
  • Column 4 shows the adjusted relative Brier score, using the “Flusight-baseline” model as the baseline for the pairwise comparison.
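The 50% inclusion criterion can be sketched as a simple filter: count each model's distinct (location, target, forecast date) combinations and keep the models whose count reaches at least half of the possible combinations. The function and field names below are hypothetical, for illustration only.

```python
# Illustrative inclusion filter for the summary table: keep models that
# submitted at least 50% of possible forecasts, where one forecast is a
# (location, target, forecast date) combination. Names are hypothetical.
def eligible_models(submissions, possible_count, threshold=0.5):
    """submissions: iterable of (model, location, target, forecast_date)
    tuples; possible_count: total number of possible combinations."""
    counts = {}
    for model, *combo in submissions:
        # A resubmitted combination counts only once.
        counts.setdefault(model, set()).add(tuple(combo))
    return {m for m, c in counts.items() if len(c) / possible_count >= threshold}
```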

Evaluation by Week

All models

In the following figures, we have evaluated models across multiple forecasting weeks. Points included in this comparison are for all models that have submitted experimental target forecasts. The models in the legend with a dot and line have scores for every week. The models with just a line are missing scores for at least one week.

The Brier score is used as the metric. The figure shows the mean Brier score across all locations for each submission week.

Models that submitted at least 50% of forecasts

The models included have submitted at least 50% of forecasts during this time, where one forecast is a location, target, forecast date combination.

In the following figures, we have evaluated models across multiple forecasting weeks. Points included in this comparison are for models that submitted at least 50% of forecasts. The models in the legend with a dot and line have scores for every week. The models with just a line are missing scores for at least one week.

The Brier score is used as the metric. The figure shows the mean Brier score across all locations for each submission week.

FY 2022-2023 Flu Season

Summary Tables

The table evaluates forecast models based on the relative Brier score, aggregated across weeks and locations.

Inclusion criteria for each column are detailed below the table.

The models included have submitted at least 50% of forecasts during this time, where one forecast is a location, target, forecast date combination. The data are initially ordered by model based on their relative Brier score aggregated across time and location, with the most accurate models at the top.

  • Column 2 lists the number of forecasts a team has submitted.
  • Column 3 shows the Brier score.
  • Column 4 shows the adjusted relative Brier score, using the “Flusight-baseline” model as the baseline for the pairwise comparison.

Evaluation by Week

All models

In the following figures, we have evaluated models across multiple forecasting weeks. Points included in this comparison are for all models that have submitted experimental target forecasts. The models in the legend with a dot and line have scores for every week. The models with just a line are missing scores for at least one week.

The Brier score is used as the metric. The figure shows the mean Brier score across all locations for each submission week.

Models that submitted at least 50% of forecasts

The models included have submitted at least 50% of forecasts during this time, where one forecast is a location, target, forecast date combination.

In the following figures, we have evaluated models across multiple forecasting weeks. Points included in this comparison are for models that submitted at least 50% of forecasts. The models in the legend with a dot and line have scores for every week. The models with just a line are missing scores for at least one week.

The Brier score is used as the metric. The figure shows the mean Brier score across all locations for each submission week.

Evaluation Periods

This figure shows the number of confirmed Influenza Hospitalization Admissions reported each week in the US for the 2022-2023 season.